DATA HANDLING (1) 


Learning Outcomes and Assessment Standards 


Learning Outcome 4: Data handling and probability 
Assessment Standard AS 1(a) 
Calculate and represent measures of central tendency and dispersion in univariate 
numerical data by: 
five number summary 
box and whisker diagrams 
ogives 
variance and standard deviation. 


Overview 


In this lesson you will: 


• Revise the terms mean, median, mode and quartiles
• Learn about five number summaries
• Learn about box and whisker plots.


Lesson 


Revision of Grade 10 concepts 


The mean 


The mean of a set of data is written as $\bar{x}$ (x bar) and is defined as

$\bar{x} = \dfrac{\text{sum of the values}}{\text{number of values}} = \dfrac{\sum x}{n}$

Example

Calculate the mean of the following data: 

12, 13, 13, 15, 16, 16, 16, 16, 16, 17, 17, 18, 18, 18, 18 


$\bar{x} = \dfrac{\sum x}{n} = \dfrac{12+13+13+15+16+16+16+16+16+17+17+18+18+18+18}{15} = \dfrac{239}{15} \approx 15{,}9$
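The same calculation can be checked with a short Python sketch. The function name `mean` and the variable `marks` are illustrative choices, not part of the lesson.

```python
# Marks from the worked example above.
marks = [12, 13, 13, 15, 16, 16, 16, 16, 16, 17, 17, 18, 18, 18, 18]


def mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


print(sum(marks), len(marks))   # 239 15
print(round(mean(marks), 1))    # 15.9
```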


The mode 


The mode is the most commonly occurring observation. 


Example 


Class A | 1 | 1 | 1 | 2 | 4 | 5 | 7 | 9 | 10
Class B | 1 | 1 | 1 | 2 | 4 | 5 | 5 | 5 | 5 | 7 | 7 | 8 | 8 | 8 | 9 | 10


For Class A, the mode is 1. For Class B, the mode is 5. 


Consider the following set of marks for a Class C: 


Class C | 0 | 1 | 1 | 2 | 2 | 2 | 3 | 4 | 4 | 4 | 5 | 5 | 6 | 7 | 9 | 10


There are two modes in this set of data: 2 and 4. They appear the same number of
times and are the most frequently occurring marks. The data is said to be bimodal.
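As a rough illustration, the modes of the three classes can be found by counting how often each mark occurs. The helper name `modes` is hypothetical, and the lists are the marks as read from the tables above.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs the greatest number of times."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


class_a = [1, 1, 1, 2, 4, 5, 7, 9, 10]
class_b = [1, 1, 1, 2, 4, 5, 5, 5, 5, 7, 7, 8, 8, 8, 9, 10]
class_c = [0, 1, 1, 2, 2, 2, 3, 4, 4, 4, 5, 5, 6, 7, 9, 10]

print(modes(class_a))  # [1]
print(modes(class_b))  # [5]
print(modes(class_c))  # [2, 4]  (two modes: bimodal)
```

A list with more than one entry signals that the data has more than one mode.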


Quartiles 


Quartiles are measures of dispersion (or spread) around the median, which is a better
measure of central tendency. The median divides the data into two halves. The
quartiles further subdivide the data into quarters.


There are therefore three quartiles: 


The lower quartile ($Q_1$): This is the median of the lower half of the values. We
also call this the 25th percentile.


The median ($Q_2$): The value that divides the data into halves. We also call
this the 50th percentile.


The upper quartile ($Q_3$): This is the median of the upper half of the values. We
also call this the 75th percentile.


Useful formulae to determine the position of the quartiles are:
The lower quartile ($Q_1$): $\frac{1}{4}(n + 1)$
The median ($Q_2$): $\frac{1}{2}(n + 1)$
The upper quartile ($Q_3$): $\frac{3}{4}(n + 1)$
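The position formulae can be tried out for a few values of $n$ with the sketch below. The function name `quartile_positions` is an illustrative assumption, and exact fractions are used so that positions such as 3,25 are not affected by rounding.

```python
from fractions import Fraction


def quartile_positions(n):
    """Return the positions of Q1, Q2 and Q3 in a sorted list of n values."""
    return tuple(Fraction(k * (n + 1), 4) for k in (1, 2, 3))


for n in (11, 12, 13):
    print(n, [float(position) for position in quartile_positions(n)])
# 11 [3.0, 6.0, 9.0]
# 12 [3.25, 6.5, 9.75]
# 13 [3.5, 7.0, 10.5]
```

A whole-number position points at a value in the data set. A position that is not a whole number falls between two values.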


Example 1 (Odd number of values) 


Note:

If the number of values ($n$) in the data set is odd, the median will always be part of the
data set.

To find the position of the median we use $\frac{1}{2}(n + 1)$.

The lower and upper quartiles will be part of the data set if $\frac{1}{4}(n + 1)$ and
$\frac{3}{4}(n + 1)$ work out to be whole numbers.

The lower and upper quartiles will not be part of the data set if $\frac{1}{4}(n + 1)$ and
$\frac{3}{4}(n + 1)$ do not work out to be whole numbers.

(a) Consider the following set of marks obtained on a class test out of 10 marks.
The number of marks is odd ($n = 11$).

2 | 2 | 3 | 4 | 5 | 5 | 6 | 7 | 7 | 8 | 9

The position of $Q_2 = \frac{1}{2}(11 + 1) = 6$.
The median of the data is 5 (the 6th value).

The position of $Q_1 = \frac{1}{4}(11 + 1) = 3$.
The lower quartile of the data is 3 (the 3rd value). It is part of the data set.

The position of $Q_3 = \frac{3}{4}(11 + 1) = 9$.
The upper quartile of the data is 7 (the 9th value). It is part of the data set.


(b) Consider the following set of 13 marks obtained on a class test out of 10
marks:

2 | 3 | 4 | 5 | 5 | 5 | 6 | 7 | 7 | 8 | 9 | 10 | 10

The position of $Q_2 = \frac{1}{2}(13 + 1) = 7$, the 7th position.
The median of the data is the 7th value:
$Q_2 = 6$

The position of $Q_1 = \frac{1}{4}(13 + 1) = 3{,}5$ (midway between the 3rd and
4th positions).
The lower quartile of the data is the average of the 3rd and 4th values:
$Q_1 = \frac{4 + 5}{2} = 4{,}5$ (not part of the data set)

The position of $Q_3 = \frac{3}{4}(13 + 1) = 10{,}5$ (midway between the 10th and
11th positions).
The upper quartile of the data is the average of the 10th and 11th values:
$Q_3 = \frac{8 + 9}{2} = 8{,}5$ (not part of the data set)


2 | 3 | 4 | ($Q_1$ = 4,5) | 5 | 5 | 5 | 6 = $Q_2$ | 7 | 7 | 8 | ($Q_3$ = 8,5) | 9 | 10 | 10

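The quartiles in Example 1 can be reproduced with a minimal sketch that follows the definition "median of the lower half" and "median of the upper half". It assumes that, when the number of values is odd, the median itself is left out of both halves. The lesson does not state this explicitly, but it agrees with the worked answers. The function names are illustrative.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    middle = n // 2
    if n % 2 == 1:
        return sorted_values[middle]
    return (sorted_values[middle - 1] + sorted_values[middle]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median of each half of the data."""
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    # Assumption: for an odd number of values the median is excluded.
    upper_half = data[half + 1:] if len(data) % 2 == 1 else data[half:]
    return median(lower_half), median(data), median(upper_half)


print(quartiles([2, 2, 3, 4, 5, 5, 6, 7, 7, 8, 9]))           # (3, 5, 7)
print(quartiles([2, 3, 4, 5, 5, 5, 6, 7, 7, 8, 9, 10, 10]))   # (4.5, 6, 8.5)
```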
Example 2 (Even number of values)
Note: 


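If $n$ is even, the median is the average of the two middle values, so it will generally
not be part of the data set.


If $n$ is even and $\frac{n}{2}$ is even, the lower and upper quartiles will not be values in the data
set. Each quartile is the average of the two values on either side of its position.


If $n$ is even and $\frac{n}{2}$ is odd, the lower and upper quartiles will be values in the data set.
Round off the position values up or down to the nearest whole number.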


(a) Consider the following set of 12 marks obtained by a class on a class test out
of 100 marks. The number of marks is even.


20 | 32 | 43 | 54 | 55 | 61 | 73 | 78 | 89 | 90 | 91 | 98


The position of $Q_2 = \frac{1}{2}(12 + 1) = 6{,}5$ (average of the 6th and 7th values).

The median of the data is $Q_2 = \frac{61 + 73}{2} = 67$

Since $n$ is even and since $\frac{n}{2} = \frac{12}{2} = 6$, which is even, the lower and upper
quartiles will not be values in the data set.

The position of $Q_1 = \frac{1}{4}(12 + 1) = 3{,}25$ (average of the 3rd and 4th values).

The lower quartile of the data is $Q_1 = \frac{43 + 54}{2} = 48{,}5$

The position of $Q_3 = \frac{3}{4}(12 + 1) = 9{,}75$ (average of the 9th and 10th values).

The upper quartile of the data is $Q_3 = \frac{89 + 90}{2} = 89{,}5$


20 | 32 | 43 | ($Q_1$ = 48,5) | 54 | 55 | 61 | ($Q_2$ = 67) | 73 | 78 | 89 | ($Q_3$ = 89,5) | 90 | 91 | 98


Lower quartile: 48,5    Median: 67    Upper quartile: 89,5


(b) Consider the following set of 10 marks obtained by a class on a class test out
of 150 marks. The number of marks is even.


12 | 60 | 95 | 105 | 120 | 125 | 130 | 135 | 140 | 142


The position of $Q_2 = \frac{1}{2}(10 + 1) = 5{,}5$ (average of the 5th and 6th
values).

The median of the data is $Q_2 = \frac{120 + 125}{2} = 122{,}5$

Since $n$ is even and since $\frac{n}{2} = \frac{10}{2} = 5$, which is odd, the lower and upper
quartiles will be values in the data set.

The position of $Q_1 = \frac{1}{4}(10 + 1) = 2{,}75$ (round up to the 3rd value).
The lower quartile of the data is 95.
The position of $Q_3 = \frac{3}{4}(10 + 1) = 8{,}25$ (round down to the 8th value).
The upper quartile of the data is 135.


12 | 60 | 95 = $Q_1$ | 105 | 120 | ($Q_2$ = 122,5) | 125 | 130 | 135 = $Q_3$ | 140 | 142
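The position rules used in Example 2 can be collected into one sketch. It is only one possible reading of the notes above: a whole-number position picks a data value, a position ending in ,5 averages its two neighbours, and a position ending in ,25 or ,75 is rounded to the nearest value when $\frac{n}{2}$ is odd and averaged otherwise. The function names are hypothetical.

```python
from fractions import Fraction
from math import floor


def value_at_position(data, position):
    """Read a quartile from sorted data at a 1-based (possibly fractional) position."""
    n = len(data)
    below = floor(position)
    fraction = position - below
    if fraction == 0:
        return data[below - 1]                      # the position is a data value
    quarter_step = fraction != Fraction(1, 2)       # position ends in ,25 or ,75
    if quarter_step and n % 2 == 0 and (n // 2) % 2 == 1:
        nearest = below + 1 if fraction > Fraction(1, 2) else below
        return data[nearest - 1]                    # round to the nearest value
    return (data[below - 1] + data[below]) / 2      # average the two neighbours


def quartiles_by_position(values):
    """Return (Q1, Q2, Q3) from the positions k(n + 1)/4 for k = 1, 2, 3."""
    data = sorted(values)
    n = len(data)
    return tuple(value_at_position(data, Fraction(k * (n + 1), 4)) for k in (1, 2, 3))


marks_a = [20, 32, 43, 54, 55, 61, 73, 78, 89, 90, 91, 98]
marks_b = [12, 60, 95, 105, 120, 125, 130, 135, 140, 142]
print(quartiles_by_position(marks_a))  # (48.5, 67.0, 89.5)
print(quartiles_by_position(marks_b))  # (95, 122.5, 135)
```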


Interquartile range (IQR)


The difference between the lower and upper quartile is called the Interquartile 
Range. It is a better measure of dispersion than the range because it is not affected 
by extreme values. It is based on the middle half of the data. It indicates how densely 
the data in the middle is spread around the median. Consider the previous example. 


12 | 60 | 95 = $Q_1$ | 105 | 120 | ($Q_2$ = 122,5) | 125 | 130 | 135 = $Q_3$ | 140 | 142


The interquartile range (IQR) $= Q_3 - Q_1 = 135 - 95 = 40$


Semi-interquartile range


The semi-interquartile range is half of the interquartile range.

The semi-IQR for the previous example is

$\frac{Q_3 - Q_1}{2} = \frac{135 - 95}{2} = \frac{40}{2} = 20$.

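The three measures of spread can be compared with a small sketch that takes the minimum, the quartiles and the maximum as inputs. The function name `spread_measures` is illustrative. The second call uses a hypothetical variation of the example in which the lowest mark is 0 instead of 12, which leaves the quartiles at 95 and 135.

```python
def spread_measures(minimum, q1, q3, maximum):
    """Return (range, interquartile range, semi-interquartile range)."""
    data_range = maximum - minimum
    iqr = q3 - q1
    return data_range, iqr, iqr / 2


# Previous example: minimum 12, Q1 = 95, Q3 = 135, maximum 142.
print(spread_measures(12, 95, 135, 142))  # (130, 40, 20.0)

# Hypothetical variation: the lowest mark becomes 0, the quartiles stay the same.
print(spread_measures(0, 95, 135, 142))   # (142, 40, 20.0)
```

The range changes with the extreme value, while the IQR and semi-IQR do not.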
Activity 1 
1. For each set of data, determine the quartiles:
A | 2 | 3 | 5 | 7 | 9 | 10 | 11 | 13 | 15 | 16 | 17 | 18 | 19 | 21 | 22 | 23 | 25 | 32
B | 2 | 3 | 5 | 7 | 9 | 10 | 11 | 13 | 15 | 16 | 17 | 18 | 19 | 21 | 22 | 23
C | 2 | 3 | 5 | 7 | 9 | 10 | 11 | 13 | 15 | 16 | 17 | 18 | 19 | 21 | 22 | 23 | 25 | 32 | 34
D | 2 | 3 | 5 | 7 | 9 | 10 | 11 | 13 | 15 | 16 | 17 | 18 | 19 | 21 | 22 | 23 | 25
2. Class results for a test out of 30 are recorded in the table below. 


10A | 16 | 12 | 16 | 11 | 14 | 15 | 22 | 16 | 17 | 15 | 26 | 23 | 16 | 22 | 16 | 17 | 24 | 19
10B | 20 | 19 | 14 | 10 | 14 | 9 | 8 | 13 | 14 | 30 | 27 | 23 | 24 | 28 | 17 | 29 | 20 | 16 | 14 | 18
10C | 5 | 20 | 14 | 12 | 7 | 2 | 12 | 21 | 14 | 26 | 14 | 14 | 12 | 14 | 21 | 24 | 14 | 14


(a) Calculate the mean for each class.
(b) Calculate the mode for each class.
(c) Calculate the median for each class.
(d) Calculate the range for each class.
(e) Calculate the lower quartile for each class.
(f) Calculate the upper quartile for each class.
(g) Calculate the interquartile range for each class.
(h) Calculate the semi-interquartile range for each class.


3. A teacher has recorded the test marks of forty Grade 10 learners. The test was
out of 10. Draw a frequency table and then calculate the mean, median and 
mode for this data. 


1 | 9 | 10 | 4 | 7 | 4 | 4 | 10
(second row of eight marks illegible)
7 | 3 | 9 | 4 | 5 | 8 | 6 | 6
1 | 3 | 10 | 2 | 2 | 7 | 8 | 7
7 | 2 | 7 | 6 | 2 | 8 | 7 | 6


Lesson


Five number summaries and box and whisker plots 


Five number summaries and box and whisker plots help us to represent and analyse 
the spread of data about the median. 


Five number summaries 
The five number summary consists of the following five values:


• Minimum: the smallest value in the data
• Lower quartile: the median of the lower half of the values
• Median: the value that divides the data into halves
• Upper quartile: the median of the upper half of the values
• Maximum: the largest value in the data
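A five number summary can be sketched in Python as follows. The sketch assumes that the quartiles are the medians of the lower and upper halves and that, for an odd number of values, the median is left out of both halves. The function names are illustrative, and the data is the set of 13 marks from Example 1(b).

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    middle = n // 2
    if n % 2 == 1:
        return sorted_values[middle]
    return (sorted_values[middle - 1] + sorted_values[middle]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for at least two values."""
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    # Assumption: for an odd number of values the median is excluded.
    upper_half = data[half + 1:] if len(data) % 2 == 1 else data[half:]
    return data[0], median(lower_half), median(data), median(upper_half), data[-1]


marks = [2, 3, 4, 5, 5, 5, 6, 7, 7, 8, 9, 10, 10]
print(five_number_summary(marks))  # (2, 4.5, 6, 8.5, 10)
```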


Box and whisker plots 


A box and whisker plot is a graphical representation of the five number summary. 


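[Figure: a box and whisker plot. The box runs from the lower quartile to the upper
quartile, with the median marked inside it. One whisker runs from the minimum to the
lower quartile and the other from the upper quartile to the maximum.]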


Note:


• Half of the values lie between the minimum value and the median.
• Half of the values lie between the median and the maximum value.
• One quarter of the values lies between the minimum value and the lower
  quartile.
• One quarter of the values lies between the lower quartile and the
  median.
• One quarter of the values lies between the median and the upper
  quartile.
• One quarter of the values lies between the upper quartile and the
  maximum value.
• Half of the values lie between the lower quartile and upper quartile.
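To see how the five values shape the plot, the sketch below draws a rough one-line text version of a box and whisker plot. It is only an illustration: the function name, the scale from 0 to 10 and the width of 41 characters are assumptions, and the summary used is that of the 13 marks in Example 1(b).

```python
def ascii_box_plot(minimum, q1, median, q3, maximum, low=0, high=10, width=41):
    """Return a one-line text sketch of a box and whisker plot."""
    def column(value):
        # Map a value on the scale low..high to a character column.
        return round((value - low) / (high - low) * (width - 1))

    row = [" "] * width
    for c in range(column(minimum), column(maximum) + 1):
        row[c] = "-"                      # whiskers
    for c in range(column(q1), column(q3) + 1):
        row[c] = "="                      # box
    row[column(minimum)] = "|"            # minimum
    row[column(maximum)] = "|"            # maximum
    row[column(q1)] = "["                 # lower quartile
    row[column(q3)] = "]"                 # upper quartile
    row[column(median)] = "M"             # median
    return "".join(row)


plot = ascii_box_plot(2, 4.5, 6, 8.5, 10)
print(plot)
```

Each quarter of the data occupies one of the four stretches between the markers, so a longer stretch means that quarter of the values is more spread out.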


Example


Consider the following set of marks for a class test (out of 10) for three classes.


CLASS A | 1 | 1 | 2 | 2 | 3 | 3 | 4 | 4 | 4 | 6 | 7 | 8 | 8 | 9 | 10 | 10 | 10
CLASS B | 1 | 2 | 4 | 4 | 4 | 4 | 5 | 7 | 8 | 8 | 8 | 8 | 9 | 9 | 9 | 10 | 10
CLASS C | 1 | 2 | 3 | 3 | 3 | 4 | 5 | 5 | 5 | 6 | 6 | 7 | 7 | 7 | 8 | 9 | 10

For each class above, create a five number summary and hence a box and whisker
plot.

CLASS A | 1 | 1 | 2 | 2 | ($Q_1$ = 2,5) | 3 | 3 | 4 | 4 | 4 = $Q_2$ | 6 | 7 | 8 | 8 | ($Q_3$ = 8,5) | 9 | 10 | 10 | 10

Five number summary

Minimum: 1

Lower quartile ($Q_1$): Position of $Q_1 = \frac{1}{4}(17 + 1) = 4{,}5$
(average of 4th and 5th value)
$Q_1 = \frac{2 + 3}{2} = 2{,}5$

Median (M or $Q_2$): Position of $Q_2 = \frac{1}{2}(17 + 1) = 9$
(9th value)
$Q_2 = 4$

Upper quartile ($Q_3$): Position of $Q_3 = \frac{3}{4}(17 + 1) = 13{,}5$
(average of 13th and 14th value)
$Q_3 = \frac{8 + 9}{2} = 8{,}5$

Maximum: 10

Box and whisker plot

[Figure: box and whisker plot for CLASS A, marked Min, $Q_1$, M or $Q_2$, $Q_3$ and Max]

CLASS B

Five number summary

Minimum: 1

Lower quartile ($Q_1$): Position of $Q_1 = \frac{1}{4}(17 + 1) = 4{,}5$
(average of 4th and 5th value)
$Q_1 = \frac{4 + 4}{2} = 4$

Median (M or $Q_2$): Position of $Q_2 = \frac{1}{2}(17 + 1) = 9$
(9th value)
$Q_2 = 8$

Upper quartile ($Q_3$): Position of $Q_3 = \frac{3}{4}(17 + 1) = 13{,}5$
(average of 13th and 14th value)
$Q_3 = \frac{9 + 9}{2} = 9$

Maximum: 10

Box and whisker plot

[Figure: box and whisker plot for CLASS B, marked Min, $Q_1$, M or $Q_2$, $Q_3$ and Max]

CLASS C | 1 | 2 | 3 | 3 | ($Q_1$ = 3) | 3 | 4 | 5 | 5 | 5 = $Q_2$ | 6 | 6 | 7 | 7 | ($Q_3$ = 7) | 7 | 8 | 9 | 10

Five number summary

Minimum: 1

Lower quartile ($Q_1$): Position of $Q_1 = \frac{1}{4}(17 + 1) = 4{,}5$
(average of 4th and 5th value)
$Q_1 = \frac{3 + 3}{2} = 3$

Median (M or $Q_2$): Position of $Q_2 = \frac{1}{2}(17 + 1) = 9$
(9th value)
$Q_2 = 5$

Upper quartile ($Q_3$): Position of $Q_3 = \frac{3}{4}(17 + 1) = 13{,}5$
(average of 13th and 14th value)
$Q_3 = \frac{7 + 7}{2} = 7$

Maximum: 10

Box and whisker plot

[Figure: box and whisker plot for CLASS C on a number line from 0 to 10, marked Min,
$Q_1$, M or $Q_2$, $Q_3$ and Max]


Symmetrical and skewed data


• Symmetrical data set (relative to the median)

If the data to the left of the median balances with the data on the right, then the
data is symmetrical about the median. Consider, for example, CLASS C.

[Figure: box and whisker plot for CLASS C, marked Min, $Q_1$, M or $Q_2$, $Q_3$ and Max]


• Skewed data (relative to the median)

If the data is spread out more to the right of the median than to the left, so that the
right-hand side of the box and whisker plot is longer, the data is said to be skewed to
the right. Consider, for example, CLASS A.

[Figure: box and whisker plot for CLASS A, marked Min, $Q_1$, M or $Q_2$, $Q_3$ and Max]

If the data is spread out more to the left of the median than to the right, so that the
left-hand side of the box and whisker plot is longer, the data is said to be skewed to
the left. Consider, for example, CLASS B.

[Figure: box and whisker plot for CLASS B, marked Min, $Q_1$, M or $Q_2$, $Q_3$ and Max]
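One rough way to check the direction of skewness from a five number summary is to compare the two parts of the box. The sketch below is a simplification that looks only at the distances from the median to each quartile and ignores the whiskers. The function name `describe_skew` is hypothetical. With the quartiles of the three classes it gives the same classification as the lesson.

```python
def describe_skew(q1, median, q3):
    """Compare the two parts of the box on either side of the median."""
    left = median - q1     # length of the box to the left of the median
    right = q3 - median    # length of the box to the right of the median
    if right > left:
        return "skewed to the right"
    if left > right:
        return "skewed to the left"
    return "symmetrical"


print(describe_skew(2.5, 4, 8.5))  # CLASS A: skewed to the right
print(describe_skew(4, 8, 9))      # CLASS B: skewed to the left
print(describe_skew(3, 5, 7))      # CLASS C: symmetrical
```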


Activity 2


The number of points scored by four Formula One racing drivers over a number of 
races are given below: 


A | 1 | 1 | 1 | 2 | 6 | 6 | 8 | 8 | 8 | 8 | 10 | 10 | 10
B | 1 | 2 | 6 | 8 | 8 | 8 | 8 | 8 | 8 | 10 | 10 | 10 | -
C | 1 | 1 | 2 | 2 | 4 | 4 | 6 | 6 | 8 | 8 | 10 | - | -
D | 2 | 2 | 2 | 4 | 4 | 6 | 6 | 8 | 8 | 10 | 10 | 10 | -


(a) Calculate the mean for each of the drivers.

(b) List the five number summary for each driver.

(c) Draw a box and whisker plot for each driver.

(d) Discuss each driver's distribution of scores in terms of the spread about the
median.

(e) Compare the performance results for each driver by using the information
obtained above.


